{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "f00a3ace",
   "metadata": {},
   "source": [
    "# Pandas\n",
    "\n",
    "hvPlot has been designed as a simple plotting interface (one-line call is enough for most cases) to [many data libraries](../../data_libraries.ipynb). It was greatly inspired by Pandas' original plotting interface, that is mostly a convenient interface to Matplotlib's plotting API, and allows in fact to pass many arguments directly to Matplotlib. On the other hand, hvPlot's plotting interface is a convenient interface to [HoloViews](https://holoviews.org/), that similarly allows to pass many arguments directly to HoloViews. Matplotlib and HoloViews are different types of visualization library, the former being a pure plotting tool (i.e. it knows how to draw pixels on your screen) while the later is more of a data exploration tool. These differences explain some of the differences you will observe between Pandas' and hvPlot's plotting APIs.\n",
    "\n",
    "Pandas offers a mechanism to [register a third-party plotting backend](https://pandas.pydata.org/docs/user_guide/visualization.html#plotting-backends). When registered, `<DataFrame|Series>.plot()` calls  are delegated to the third-party tool. hvPlot has implemented the required interface to be registered as Pandas' plotting backend. As a consequence there are two main ways to generate hvPlot plots from Pandas objects:\n",
    "- By importing `hvplot.pandas` and using the `.hvplot()` namespace (recommended).\n",
    "- By registering hvPlot as Pandas' plotting backend and using Pandas' `.plot()` namespace.\n",
    "\n",
    ":::{note}\n",
    "Pandas does not force third-party plotting tools like hvPlot to implement all of its plotting methods. It also does not enforce each method to implement specific arguments.\n",
    ":::\n",
    "\n",
    "As a summary about hvPlot's compatibility with Pandas' plotting API:\n",
    "- hvPlot can be registered as Pandas plotting backend.\n",
    "- As an convenient interface to HoloViews and not to Matplotlib, hvPlot does not aim to be 100% compatible with Pandas' API. However, Pandas users will find the plotting methods they are used to, and most of the generic arguments they accept. In a sense, hvPlot aims more for familiarity than compatibility.\n",
    "\n",
    "For a more in-depth comparison between Pandas and hvPlot APIs, visit the [Pandas API](./Pandas_API.ipynb) reference that recreates the [Pandas chart visualization guide](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html) using both APIs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3cf3e639",
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2df838be",
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot.pandas  # noqa\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "df = hvplot.sampledata.penguins('pandas')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "85adbf89",
   "metadata": {},
   "source": [
    "## Switch Pandas backend to hvPlot\n",
    "\n",
    ":::{hint}\n",
    "This approach is an easy way (one-line change) to convert some code from generating plots with Pandas & Matplotlib to Pandas & hvPlot and see whether you like the output or not. Generally, we recommend installing the `hvplot` namespace on Pandas objects by importing `hvplot.pandas`, and invoking hvPlot via this namespace, e.g. `df.hvplot.line()`, as it can be adapted to other data libraries (e.g. if you use Dask, you can install the `hvplot` namespace on Dask objects with `import hvplot.dask`).\n",
    ":::\n",
    "\n",
    ":::{note}\n",
    "This requires `pandas >= 0.25.0`.\n",
    ":::\n",
    "\n",
    "hvPlot can be registered as Pandas' plotting backend instead of Matplotlib with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cf2f51f3",
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.options.plotting.backend = 'hvplot'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "42ae3a9e",
   "metadata": {},
   "source": [
    "Once registered, hvPlot plots are generated when calling Pandas `.plot()`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "37458f8b",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.plot.scatter('bill_length_mm', 'bill_depth_mm')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "642edee7",
   "metadata": {},
   "source": [
    ":::{note}\n",
    "To function correctly hvPlot needs to load some front-end (Javascript, CSS, etc.) content in a notebook. This is usually achieved as a side-effect of importing for example `hvplot.pandas`. In the example above, this step is done in the first cell that calls `.plot()`. It is important not to delete this cell to avoid running into hard-to-debug interactivity issues.\n",
    ":::"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b0bcaf23",
   "metadata": {},
   "source": [
    "\n",
    "### API comparison\n",
    "\n",
    "| Kind | In Pandas | In hvPlot | Comment |\n",
    "| - | - | - | - |\n",
    "| {meth}`hvplot.hvPlot.area` | ✅| ✅| Alpha set to 0.5 automatically in Pandas when `stacked=False`, not in hvPlot. |\n",
    "| {meth}`hvplot.hvPlot.bar`| ✅| ✅| |\n",
    "| {meth}`hvplot.hvPlot.barh` | ✅| ✅| |\n",
    "| {meth}`hvplot.hvPlot.bivariate`| ❌| ✅| |\n",
    "| `DataFrame.boxplot`| ✅| ✅|  |\n",
    "| {meth}`hvplot.hvPlot.box`| ✅| ✅| In Pandas `colors` can be used to specify the color of the components of the box plot, in hvPlot this can roughly be done via backend-specific style options. `sym` and `positions` are not supported in hvPlot. `vert` in Pandas can be replaced by `invert` in hvPlot. |\n",
    "| {meth}`hvplot.hvPlot.density`| ✅| ✅| |\n",
    "| {meth}`hvplot.hvPlot.errorbars`| ❌| ✅| Error bars can be set with `xerr` and `yerr` [in Pandas](https://pandas.pydata.org/docs/user_guide/visualization.html#plotting-with-error-bars) |\n",
    "| {meth}`hvplot.hvPlot.heatmap`| ❌| ✅| |\n",
    "| {meth}`hvplot.hvPlot.hexbin` | ✅| ✅| `reduce_C_function` in Pandas is named `reduce_function` in hvPlot.  |\n",
    "| {meth}`hvplot.hvPlot.hist` | ✅| ✅| Stacking not supported in hvPlot. hvPlot uses `invert=True` instead of `orientation='horizontal'`. Pandas' `hist` method accepts a Numpy NdArray for `by` but hvPlot does not. |\n",
    "| `DataFrame.hist` | ✅| ✅| Pandas' `DataFrame.hist()` plots the histograms of the columns on multiple subplots. hvPlot creates instead an overlay of histogram plots. To reproduce Pandas' behavior, you can set `subplots=True` to create a layout of plots (1 per column in this case), and additionally call `.cols(2)` on the object returned to lay the plots in a layout with a maximum number of 2 columns. |\n",
    "| {meth}`hvplot.hvPlot.kde`| ✅| ✅| |\n",
    "| {meth}`hvplot.hvPlot.labels` | ❌| ✅| |\n",
    "| {meth}`hvplot.hvPlot.line` | ✅| ✅| `colormap` not yet supported in hvPlot, use `color` instead. |\n",
    "| {meth}`hvplot.hvPlot.ohlc` | ❌| ✅| |\n",
    "| {meth}`hvplot.hvPlot.scatter`| ✅| ✅| |\n",
    "| {meth}`hvplot.hvPlot.step` | ❌| ✅| |\n",
    "| {meth}`hvplot.hvPlot.table`| ✅| ✅| Pandas has a [whole API](https://pandas.pydata.org/docs/user_guide/style.html) dedicated to displaying and styling tables. It also offers {func}`pandas:pandas.plotting.table` to convert a DataFrame to a Matplotlib table |\n",
    "| `pie`| ✅| ❌| Not yet implemented in HoloViews, see [this issue](https://github.com/holoviz/holoviews/issues/4800)|\n",
    "| {meth}`hvplot.hvPlot.points` | ❌| ✅| For two independent variables, useful for geographic data for examples|\n",
    "| {meth}`hvplot.hvPlot.violin` | ❌| ✅| |\n",
    "| {func}`hvplot.plotting.andrews_curves` | ✅| ✅| |\n",
    "| `autocorrelation_plot` | ✅| ❌| |\n",
    "| `bootstrap_plot` | ✅| ❌| |\n",
    "| {func}`hvplot.plotting.lag_plot` | ✅| ✅| |\n",
    "| {func}`hvplot.plotting.parallel_coordinates` | ✅| ✅| |\n",
    "| `radviz` | ✅| ❌| |\n",
    "| {func}`hvplot.plotting.scatter_matrix` | ✅| ✅| |\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62ad20db",
   "metadata": {},
   "source": [
    "## Notable differences"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "47e4372f",
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.options.plotting.backend = 'matplotlib'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07ef87a7",
   "metadata": {},
   "source": [
    "This section aims to describe a few of the main notable differences between Pandas and hvPlot plotting APIs. More specific differences can be found in the [Pandas API](./Pandas_API.ipynb) page that recreates the [Pandas chart visualization guide](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c9b85ca",
   "metadata": {},
   "source": [
    "### Figure handling\n",
    "\n",
    "A `plot` call in Pandas returns a Matplotlib `Axes` object. This object can be passed to Pandas' `plot` API via the `ax` argument, for example to overlay two different plots. The `ax` argument is not supported in hvPlot."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7a631281",
   "metadata": {},
   "outputs": [],
   "source": [
    "plot = df.plot.scatter('bill_length_mm', 'bill_depth_mm', figsize=(4, 3))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "49dc13d4",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(plot)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "636b1120",
   "metadata": {},
   "source": [
    "hvPlot's plotting API returns HoloViews objects. These objects are wrappers around the original dataset, whose rich representation is a plot."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "93a4b095",
   "metadata": {},
   "outputs": [],
   "source": [
    "plot = df.hvplot.scatter('bill_length_mm', 'bill_depth_mm', hover_cols=['species'])\n",
    "print(plot)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2483daa8",
   "metadata": {},
   "outputs": [],
   "source": [
    "plot"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e0ddec08",
   "metadata": {},
   "outputs": [],
   "source": [
    "plot.data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "88e9a0d5",
   "metadata": {},
   "source": [
    "Using HoloViews' API, this object can be further customized."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8d9e56f4",
   "metadata": {},
   "outputs": [],
   "source": [
    "import holoviews as hv\n",
    "\n",
    "plot.opts(\n",
    "    height=300, width=300, color=hv.dim('species'),\n",
    "    cmap='Category10', show_legend=False,\n",
    ").hist(['bill_length_mm','bill_depth_mm'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4abdf44e",
   "metadata": {},
   "source": [
    "### Overlays and layouts\n",
    "\n",
    "In Pandas, overlays are usually created by passing down an `Axes` object to another `plot` call via the `ax` argument. Layouts are created by setting `subplots=True`, and can be customized further with the `layout` argument, or with Matplotlib's API.\n",
    "\n",
    "The approach is quite different in hvPlot as HoloViews offers some very convenient API with `*` for overlaying plots and `+` for laying out plots. Together with the `subplots` argument and HoloViews' `.cols(N)` method to limit the number `N` of plots per row, this forms an API flexible enough to handle most situations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5161059a",
   "metadata": {},
   "outputs": [],
   "source": [
    "df1 = df.query('species == \"Adelie\"')\n",
    "df2 = df.query('species == \"Gentoo\"')\n",
    "ax = df1.plot.scatter('bill_length_mm', 'bill_depth_mm', color=\"blue\", label=\"Adelie\")\n",
    "df2.plot.scatter('bill_length_mm', 'bill_depth_mm', color=\"green\", label=\"Gentoo\", ax=ax);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "97c9a890",
   "metadata": {},
   "outputs": [],
   "source": [
    "(\n",
    "    df1.hvplot.scatter('bill_length_mm', 'bill_depth_mm', color=\"blue\", label=\"Adelie\")\n",
    "    * df2.hvplot.scatter('bill_length_mm', 'bill_depth_mm', color=\"green\", label=\"Gentoo\")\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c9d4132a",
   "metadata": {},
   "outputs": [],
   "source": [
    "dft = pd.DataFrame(np.random.randn(1000, 4), columns=list(\"ABCD\")).cumsum()\n",
    "dft.plot.line(subplots=True, layout=(2, 3), figsize=(8, 6));"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "06df9491",
   "metadata": {},
   "outputs": [],
   "source": [
    "dft.hvplot.line(subplots=True, width=220).cols(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1841c6cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "dft['A'].hvplot.line(width=220) + dft['B'].hvplot.line(width=220)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1dc57170",
   "metadata": {},
   "source": [
    "### Plot dimensions\n",
    "\n",
    "Setting plot dimensions in Pandas is done with the `figsize` argument that accepts a tuple *(width, height)* in *inches*. `figsize` is not supported in hvPlot, instead, plot dimensions are set with the `width` (default is `700`) and `height` (default is `700`) arguments that accept integer values in pixels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3086024b",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.plot.scatter('bill_length_mm', 'bill_depth_mm', figsize=(4, 3));"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2a8b1708",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.hvplot.scatter('bill_length_mm', 'bill_depth_mm', width=350, height=250)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "924b46d6",
   "metadata": {},
   "source": [
    "### Default color cycle and colormap\n",
    "\n",
    "Pandas and hvPlot have different default color cycle and colormap.\n",
    "\n",
    "The default color cycle in Pandas is Matplotlib's `tab10` (or [\"Tableau 10\"](https://www.tableau.com/blog/colors-upgrade-tableau-10-56782)) 10-colors sequence. hvPlot's default color cycle is inherited from HoloViews and is a [custom 12-colors sequence](https://github.com/holoviz/holoviews/issues/1591)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4bfc0ea9",
   "metadata": {},
   "outputs": [],
   "source": [
    "dfl = pd.DataFrame({col: [0, i+1] for i, col in enumerate('ABCDEFGHIJLKMN')})\n",
    "dfl.plot();"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b140df03",
   "metadata": {},
   "outputs": [],
   "source": [
    "dfl.hvplot().opts(legend_cols=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7e5ef64b",
   "metadata": {},
   "source": [
    ":::{note}\n",
    "\n",
    "hvPlot's default color cycle can be set via HoloViews API, make sure to run this before importing the plotting extension (e.g. `hv.extension('bokeh')`, done implicitly when running `import hvplot.pandas`).\n",
    "\n",
    "```python\n",
    "import holoviews as hv\n",
    "import matplotlib\n",
    "\n",
    "hv.Cycle.default_cycles['default_colors'] = list(map(matplotlib.colors.rgb2hex, matplotlib.colormaps['tab10'].colors))\n",
    "\n",
    "import hvplot.pandas\n",
    "\n",
    "...\n",
    "```\n",
    ":::"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "93791e5c",
   "metadata": {},
   "source": [
    "The default categorical colormap in Pandas is a gray scale. In hvPlot, it is [`glasbey_category10`](https://colorcet.holoviz.org/user_guide/Categorical.html#starting-colors), a colormap with 256 colors that extends Bokeh's `Category10` colormap (originally from D3)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ed01a6a6",
   "metadata": {},
   "outputs": [],
   "source": [
    "categories = list('ABCDEFGHIJLKMNOPQRST')\n",
    "dfc = pd.DataFrame({\n",
    "    'x': np.random.rand(len(categories)),\n",
    "    'y': np.random.rand(len(categories)),\n",
    "    'category': categories,\n",
    "})\n",
    "dfc['category'] = dfc['category'].astype('category')\n",
    "dfc.plot.scatter('x', 'y', c='category');"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5f86d9bf",
   "metadata": {},
   "outputs": [],
   "source": [
    "dfc.hvplot.scatter(\n",
    "    'x', 'y', c='category', legend='top_right'\n",
    ").opts(legend_cols=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2a99587",
   "metadata": {},
   "source": [
    "The default colormap for numerical values is `viridis` in Pandas and `kbc_r` (cyan to very dark blue) in hvPlot (see more info in [this issue](https://github.com/holoviz/holoviews/issues/3500))."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c627ae70",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.plot.scatter('bill_length_mm', 'flipper_length_mm', c=df['body_mass_g']);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b7a50950",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.hvplot.scatter('bill_length_mm', 'flipper_length_mm', c=df['body_mass_g'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0e938a68",
   "metadata": {},
   "source": [
    ":::{note}\n",
    "hvPlot does not allow yet to configure globally the default colormap. The `colormap` (or `cmap`) argument can be used instead locally.\n",
    ":::"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2ffd27a8",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.hvplot.scatter('bill_length_mm', 'flipper_length_mm', c=df['body_mass_g'], cmap='viridis')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5b4ffd5f",
   "metadata": {},
   "source": [
    "### Marker size\n",
    "\n",
    "The marker size in {meth}`hvplot.hvPlot.scatter` and {meth}`hvplot.hvPlot.points` plots can be controlled with the `s` argument. When converting a plot from Pandas to hvPlot, the size has to be increased to obtain an output visually similar.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1dd1145c",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.plot.scatter('bill_length_mm', 'bill_depth_mm', s=50, figsize=(4, 4));"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4a16b1f4",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.hvplot.scatter('bill_length_mm', 'bill_depth_mm', s=110, aspect=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "361bb8d6",
   "metadata": {},
   "source": [
    "```{toctree}\n",
    ":hidden: true\n",
    ":titlesonly: true\n",
    "\n",
    "Comparison with Pandas API <Pandas_API>\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
